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Abstract 



In this article we present some statistical applications of the functional singular 
value decomposition (FSVD). This tool allows us to decompose the sample mean 
of a bivariate stochastic process into components that are functions of separate 
variables. These components are sometimes interpretable functions that summa- 
rize salient features of the data. The FSVD can be used to visually detect outliers, 
to estimate the mean of a stochastic process or to obtain individual smoothers of 
the sample surfaces. As estimators of the mean, we show by simulation that FSVD 
estimators are competitive with tensor-product splines in some situations. 

Key Words: Eigenvalues and eigenfunctions; Functional data analysis; Outlier 
detection; Principal component analysis; Spectral decomposition; Spline smooth- 
ing. 



1 Introduction 



The analysis of samples of curves has become more common in statistical applica- 
tions in recent years. In many applications, the data consists of discrete realiza- 
tions of a univariate process, say X(t), where t can be time (e.g. growth curves in 
Gasser et al., 2004), distance (e.g. biomarker expression curves in Morris and Car- 
roll, 2006) or age (e.g. income distribution densities in Kneip and Utikal, 2001), 
among other possibilities. More examples and statistical methodology can be found 
in Ramsay and Silverman (2002, 2005) or Ferraty and Vieu (2006). 

Multivariate stochastic processes, on the other hand, have received less atten- 
tion. By multivariate process we mean a real-valued process X(s) that is a function 
of a multidimensional variable s. They are also known as random fields (Adler 
and Taylor, 2007). Although they are less common in statistics than univariate 
processes, they play an important role in fMRI studies and spatial statistics (Taylor 
and Worseley, 2007; Nychka, 2000). In these applications s is a point in R 2 or M 3 . 
However, in other situations the variables do not belong to a single natural space. 
For example, X(s,t) may be the mortality rate for individuals of age s during year t 
in a given country, or the outcome of a multichannel electroencephalography study 
where t is time and s is the location of the electrode on the scalp. It is clear that 
the variables s and t belong to different spaces; although the product space could 
be regarded as a single space, this would be more a mathematical formalization 
than a natural structure implied by the data. 

To understand more clearly the problems involved, in Fig. [T] we have plotted 
the sample mean of log-mortality rates for ten European countries. The raw mean 
shows some irregularities due to random noise. To regularize a bivariate estimator 
like this, one would normally employ a smoothing method based on splines (Gu, 
2000) or kernels (Hardle and Muller, 2000). However, those global smoothers will 
most likely level off important features of the data, like the increased mortality 
rates during the Second World War, which are sharp but localized features. 

In this paper we present a different approach, based on a generalization of the 
singular value decomposition. The basic idea is to approximate a bivariate function 
fi(s, t) with a sum of functions of separate variables, fi^(s, t) = Y7k=i ^l^^ki^kit)' 
where fc and ip k are univariate functional principal components (Silverman, 1996; 
Yao and Lee, 2006; Gervini, 2006). The components are sometimes interpretable 
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Figure 1 : Human Mortality Data. Mean of log-mortality rates for ten European 
countries. 
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functions that summarize important features of the data, and can be used, for ex- 
ample, to detect atypical observations. Individual smoothers of the sample surfaces 
can also be obtained as by-products. The bivariate singular value decomposition 
has been used in image analysis and physics (Dente et al., 1996; Aubry et al., 
1991), under the name of "biorthogonal decomposition". However, these articles 
disregard smoothing issues, using raw principal components for estimation. In 
most statistical applications, that would lead to extremely noisy and uninformative 
estimates. In contrast, the method we present here produces smooth and regular 
estimators. 

This article is organized as follows. The functional singular value decomposi- 
tion (FSVD) is presented in Section [2l and smooth estimators of the components 
are introduced in Section [3l An application to a real dataset in Section [4] illus- 
trates the potential of the FSVD as a graphical tool. In Section [5] we compare by 
simulation the behavior of the FSVD with tensor-product splines as estimators of 
the mean. Abbreviated proofs of the theorems are given in the Appendix; more 
detailed proofs and additional material is available on a Technical Report that will 
be posted on the author's website. 



2 The functional singular value decomposition 

Let X(s,t) be a real-valued stochastic process in L 2 (§ x T) with finite expectation 
/i(s,t) and finite covariance function p{(si,ti), (s 2 ,t 2 )}. We assume that S and T 
are closed intervals in R. Let us define the kernel functions 

h(s 1 ,s 2 ) = J fj l (s 1 ,t)fj,(s 2 ,t) dt 

and 

k 2 (ti,t 2 ) = / /i(s,ti)/i(s,t 2 ) ds. 

We say that e L 2 (§) is an eigenfunction of ki with eigenvalue A if J s ki(s, u)(j)(u)du = 
X(j)(s) for almost every s e S. The eigenfunctions of k 2 are defined in a similar way, 
only that they belong to L 2 (7). The next theorem establishes the existence of a 
decomposition of k\, k 2 and /i in terms of these eigenfunctions. 
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Theorem 1 There exist a non-increasing sequence of positive eigenvalues {\ k } of k\ 
and k 2 , an orthonormal sequence {<f) k } of eigenfunctions of k\ and an orthonormal 
sequence {ip k } of eigenfunctions ofk 2 such that 

h(si, 8 2 ) = x k<j) k (si)(j) k (s2), (1) 

k>l 

k 2 (t 1 ,t 2 ) = J2 x kMti)Mt2) (2) 

k>l 

and 

f x(s,t) = J2^k /2 Ms)Mt)- (3) 

k>l 

The series $2$ and © converge in the sense of the L 2 norm. If in addition p(s, t) 
is continuous, then {4> k } and {i> k } are continuous functions and the convergence of 
(Q]) and d2P 15 absolute and uniform in both variables, with the identities holding for 
each (si, s 2 ) and each (ii, t 2 ). If the right-hand side of ® converges uniformly and 
absolutely, then the identity also holds for every (s, t). 

Theorem Q] implies that the truncated series 

^(s,t) = J2^k /2 Ms)Mt) (4) 

k=l 

converges to p(s,t) in the sense of L 2 (§ x T) as p increases, and that the conver- 
gence is pointwise for every (s,t) if the right-hand side of ([3]) converges uniformly 
and absolutely. The latter occurs if, for instance, the <f) k s and the ip k s are uniformly 
bounded and z2k>i \ * s finite. 

In analogy with the multivariate singular value decomposition, the truncated 
series ^ given by (|4]) provides the best possible approximation of // among linear 
combinations of functions of separate variables, in the sense of the L 2 (§ x T) norm. 

Theorem 2 Let "K p be the class of functions h(s,t) = J2 P k=i a kfk{t)gk{s) with {f k } 
and {g k } orthonormal in L 2 (7) and L 2 (§), respectively. Then 

min llu — /ill 2 = llu — u^\\ 2 , 

with p^ as in ©. 
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The function ,u (p) (s,t) is the sum of p functions of separate variables, d k (s,t) = 

1 /2 

V 0fc( s )V'fe(' t )' tnat we wm can "detail functions". The detail functions are orthog- 
onal in both variables, and = X]/ 2 , so they provide finer levels of detail as k 
increases. An appealing feature of the detail functions is that they are often inter- 
pretable functions, giving us information about the most relevant characteristics of 
the process under investigation. 

Of course, all this would be of little practical use if the computation of the <p k s 
and ip k s required a good preliminary estimator of /i. But we show below that good 
estimators of the eigenfunctions can be obtained from the raw data, and these 
estimators are then used to construct a smooth estimator of /i. 

3 Smooth estimation of the eigenfunctions 

Let Xl, . . . ,X n be an i.i.d. sample of the process X. In most cases, the X { s are 
observed on a discrete grid {sj} x {t k } C S x 7 with random error, so the data 
follows the model 

x ijk = Xi(sj,t k ) + e ijk , i = 1, . . . , n, j = 1, . . . , m, k = 1, . . . , r. (5) 

We will assume that E(e ijk ) = 0, e ijk is independent of X i} e ijk and e V y k i are inde- 
pendent if i ^ i', and E(e ijk eiy k >) = o^bjyb^ (where 5 is Kronecker's delta). 

The simplest estimator of /i at the grid points is the cross sectional mean, 
p,(sj,tk) = J27=i x ijk/ n - Tne corresponding estimators of the kernel functions ki 
and k 2 , using the trapezoid rule for numerical integration, are 

r 

ki(sj,Sj>) = y^u fc /i(sj,tfe)/i(sj/,tfc) 
k=\ 

and 

rn 

h.{tk,t v ) = y^^/f(sj,t fc )/j(g j ,t fc /), 

where m = (t 2 - h)/2, u k = (t k+1 - t k -i)/2, k = 2, . . . , r - I, u r = (t r - t r -i)/2, 
and vx = (s 2 - si)/2, Vj = (s j+1 - Sj_i)/2, j = 2, . . . , m - I, v m = (s m - s m _i)/2. 

From k\ and k 2 we can compute smooth estimators of the eigenfunctions {4> k } 
and {ip k } using spline models (such as B-splines; de Boor, 2001) as follows. We 
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know that 

<\> x = argmax|| fl || =1 // h{si, s 2 )g(si)g(s 2 )ds 1 ds 2 . 

Then, given a spline basis {/3 l7 . . . , (3 q } in L 2 (§), we write g(s) = Yl]=i bj/3j( s ) an d 
define 

bi = argmax{b T S7b : b T rb =1}, 

where % = ff h(si, s 2 )(3 i (s 1 )/3 j (s 2 )ds 1 ds 2 and IV,- = / /3 i (s)/3 j (s)ds. Then ^(s) = 
Y^j=i bij/3j(s) is a spline estimator of the first eigenfunction of k\. 
For the rest of the eigenfunctions we proceed sequentially: since 

(j) k = axgniax{ff ki(s 1 ,s 2 )g(s 1 )g(s 2 )ds 1 ds 2 : ||#|| = 1 and (g,^) = for j < k] , 

we define 

b fc = argmax{b T Ob : b T rb = 1, b T rb, = 0, j < k} (6) 

and set 4> k {s) = X)J=i bkj(3j(s). The corresponding eigenvalues can be estimated by 
A fc = bfefibfe. 

Computationally, © is a very simple problem. Let V = diag(t>i, . . . ,v m ), B e 
K !Xra with Bij = Pi{sj), and Kx e M mxm with = k^sj). Then, using the 
trapezoid rule for numerical integration, Cl = B VK^VB and T = B T VB. If T 1 / 2 
denotes the symmetric square root of T and c k the kth unit-norm eigenvector of 
T-^nr^ 2 , then b fc = v- l / 2 c k . 

If the true eigenfunctions belong to the space generated by the specified spline 
basis, and the eigenvalues of r _1 / 2 f2r _1 ^ 2 (with f2 given below) have multiplicity 
one, then the above estimators are consistent. This is a consequence of the next 
theorem together with the results of Tyler (1981). 

Theorem 3 Let fl e R qXq be given by = ff ki(s u s 2 )(3 i {s 1 )(3 j (s 2 )ds 1 ds 2 . If 
maxfj -» as m — »■ oo and maxu k — >■ as r — > oo, then Ct — »■ Cl in probability 
as n, m and r go to infinity. 

In practice, though, the eigenfunctions may not belong to a spline space. But 
the asymptotic bias will be negligible if the spline basis is appropriately chosen. For 
that reason, in this paper we use adaptive free-knot splines as in Gervini (2006). 
Another possibility is to use a large number of basis functions with global regular- 
ization, as in Silverman (1996), but we prefer the free-knot approach because it 
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provides better fits for the local features of the eigenfunctions. 

Concretely, the algorithm we implemented aggregates knots by maximizing ([6]) 
over a grid of candidates (usually the grid {sj} itself) until there is no significant 
improvement on the objective function ([6]) . Repeated knots are allowed, since they 
provide better resolution of the local features of the components (at the expense 
of fewer degrees of differentiability). The optimal number of knots can be chosen 
either subjectively or by cross-validation. This procedure must be repeated for each 
component because the optimal placement and number of knots changes with each 
component. 

The eigenfunctions {ip k } of k 2 are estimated in a similar way, using a spline 
basis in L 2 (7). Since the choice of sign of the eigenfunctions is always arbitrary, 

-1/2 

care must be taken so that X k = J J jl(s,t)(j) k (s)ip k (t)dsdt is positive. As before, 

-1/2 - - 

we use the trapezoid rule for numerical integration, so X k = fc (s) T VXU^ A .(t), 
where fe (s) is the vector with elements 4> k {sj) and ^ fc (t) is the vector with elements 
^> k (tj); X is the average of the matrices Xj with elements (Xi)j k = x,^ k and U = 
diag(«i, ...,u r ). 

The eigenfunctions are estimated sequentially until a given order p, and then 
we define 

k=l 

The order p must be chosen with care, to reduce bias as much as possible. For 
reasons that will become clearer in Sections [4] and [5l we recommend to use a large 
p as long as the estimators of the eigenfunctions are not overwhelmed by noise, 
even if the corresponding A fe s seem to be negligibly small. 

Interestingly, fi^ can be further decomposed into terms that represent the indi- 

-1/2 

vidual contributions of the X, ; s, since \ k = Yll=i w ik /nwith w ik = fc (s) T VXjU^ fc (t). 
Note that w ik is an estimator of w ik = J J Xi(s,t)4> k (s)tjj k (t)dsdt. Then we can de- 
fine individual predictors of the unobserved sample paths X{(s, t), 

xl p \s,t) = J2^Ms)^ k (t). 

k=l 

The score vectors Wj are useful for exploratory data analysis; for example, they 
may reveal outliers or unusual groupings in the data, as we show by example in 
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Section @J The predictors X- p) can also be used to select the best order p by cross- 
validation. 

4 Example: evolution of human mortality in the 20th 
century 

The socioeconomic progress experienced by western European countries after the 
Second World War is very graphically exemplified by the evolution of human mor- 
tality curves. Mortality rates, which are the percentages of people of certain age 
who die in a given year, can be seen as longitudinal of functional data in two 
senses: for a given year, mortality rates are a function of age; and for each age, the 
evolution of mortality rates over the years are a time series. But a thorough sta- 
tistical analysis must take into account the interplay between these two variables; 
that is, the data must be seen as realizations of a bivariate stochastic process. 

In this section we analyze mortality rates between the years of 1930 and 2000, 
for people ranging from to 90 years of age. The data was downloaded from the 
Human Mortality Database website, www.mortality.org. We only included coun- 
tries of western Europe for which complete data was available: Belgium, Denmark, 
England, Finland, France, Italy, the Netherlands, Norway, Spain and Sweden. For 
country i we defined Xi{s,t) as the logarithm of the mortality rate for age s at year 
t; the data was observed on the grid {0, 1, ... , 90} x {1930, 1931, . . . , 2000}. 

We computed three pairs of eigenfunctions, which are shown in Fig. El The 

-1/2 -1/2 -1/2 

corresponding root-eigenvalues were X 1 = 435.85, A 2 = 11.09 and A 3 = 6.71. 
Clearly, the first eigenvalue is dominant. However, the second and third detail 
functions do improve the fit in ways that are visually noticeable (the fact that 
obvious visual improvements may be associated with very small eigenvalues was 
observed by Dente et al., 1996). 

We see that ^(s) (Fig. [2](a)) can be interpreted as the basic shape of a human 
mortality curve: high infant mortality is followed by a sharp decrease until ado- 
lescence, then a sharp increase occurs that levels off at ages 20 to 30, followed 
by a steady increase from then on. The companion eigenfunction ^(i) (Fig. [2Kb)) 
is the overall mortality trend over this 71-year period: a modest decrease in the 
early 30's was punctuated by the Second World War, followed by a remarkably fast 
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Figure 2: Human Mortality Data. Free-knot spline estimators of the eigenfunctions: 

(a) Ms), (b) fat), (c) fas), (d) fat), (e) Ms) and (f) fat). 
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Figure 3: Human Mortality Data, (a) Raw mean and (b) first-order singular value 
approximation. 

decrease in mortality that has continued until these days. The first-order approx- 
imation }jP^ is depicted in Fig. [3} together with the raw mean. We see that the 
approximation is very good, but some flaws are obvious. For example, newborn 
mortality (s = 0) remains constant over the years in Fig. [Mb) while it is obviously 
decreasing in Fig. [3](a) . 

The second component 4> 2 ( s ) CFig. [2Kc)) is mostly related to infant mortality, 
with $> 2 (A) CFig- EKd)) showing a steady decrease over the years except for the 
war period. Clearly, ffi' (Fig. @Kb)) provides a better fit for infant mortality than 
pS 1 '. The third-order approximation /t^ (Fig. (Mb)) improves the fit for the war 
years. Note that for this period, fi^ underestimates mortality for ages 20 to 30 
and overestimates it for ages 60 and over. Higher levels of detail could be added, 
but it is hard to see any features of the raw mean that have not been accounted for 
by£ (3) . 

An analysis of individual countries also reveals interesting facts. The scatter 
plot of the component scores (Fig. [6]) shows three points that stand apart from 
the rest. The most extreme case, having the smallest first-component score and the 
largest third-component score, is Finland. This is an unexpected result for someone 
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Figure 4: Human Mortality Data, (a) Second-order detail function and (b) second- 
order singular value approximation of the mean. 



Figure 5: Human Mortality Data, (a) Third-order detail function and (b) third- 
order singular value approximation of the mean. 
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Figure 6: Human Mortality Data. Individual component scores of the ten countries. 

unfamiliar with Finnish history, but it turns out that Finland was fighting on two 
different fronts during the war years. A quick comparison of the individual mortal- 
ity plots (shown in the Technical Report) reveals that Finland, indeed, experienced 
the largest increase in mortality rate for the 20-40 age bracket during the war years 
among the countries in this sample (this is precisely what a small first-component 
score accompanied by a large third-component score indicates, according to our 
interpretation of the components). 

The other two atypical points are Spain and Italy. Spain did not participate 
in the Second World War but went through a civil war in the 1930s, showing a 
different mortality pattern from the rest of the countries; in particular, the decrease 
in child mortality after 1945 was not as fast as for the other countries. Italy, by 
contrast, has the largest second-component score and is the country with the fastest 
post-war decrease in infant mortality. 

This example illustrates the kind of insight that can be gained from the func- 
tional singular value decomposition. While other methods (like tensor-product 
splines) can provide estimators of the mean function, the FSVD also offers an in- 
terpretable decomposition of the mean that can reveal interesting aspects of the 
data. 
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5 Simulations 



As mentioned before, we see the FSVD mainly as a tool for graphical and ex- 
ploratory data analysis, but since (0]) can be used as an estimator of /i, we ran 
a Monte Carlo study to compare its performance with that of tensor-product spline 
estimators. Specifically, we wanted to assess the ability of our free-knot component 
estimators to adapt to local features of /i, and the potential dangers of underesti- 
mating the approximation order p. 

We generated data from a mean-plus-error model Xi jk = jj,(sj,t k ) + e^. Two 
different means were considered, /x 1 (s, t) = Efc=i ^a/ 2 ^^)^^) an d /^(-M) = 
ELi >i /2 4> k (s)Mt)> with <f> k (s) = V2sm(2k7rs), ^ k (t) = yftaoaQkirt), Xi = I, 
A 2 = 1/2 and A 3 = 1/32. The grids {sj} and {t k } consisted of m = r equispaced 
points in [0, 1], and the errors e ijk were independent N(0, a 2 ). We considered two 
grid sizes, m = 20 and m = 30, two sample sizes, n — 10 and n = 50, and two error 
variances, a 2 = 1 and a 2 = 4. Each model was replicated 200 times (although not 
all combinations of factors were considered; see Table [D . 

For the tensor-product spline estimator, we took two bases of cubic B -splines 
with knots placed at the grid points. The estimator was regularized by penalizing 
the integrated squared partial derivatives, as explained in Hastie et al. (2001, ch. 
5) . The choice of a good smoothing parameter is crucial for the behavior of these 
estimators. To be as fair as possible with tensor-product splines, we chose the 
optimal smoothing parameter: the minimizer of \\jl — In practice this cannot 
be done because p is unknown, so the estimation errors reported in Table [D (under 
"TPS") will be lower than those attainable in practice. 

As FSVD estimator of fi we took a two-component decomposition, fi^, with 
4> k s and tp k s estimated by free-knot cubic splines, as explained in Section [31 Here 
the number of knots plays the role of smoothing parameter, so we considered two 
possibilities: a fixed number of knots (3 for <j> lt 5 for <f> 2 , 2 for ip 1 and 4 for ip 2 ), and 
an optimal number of knots (the number that minimizes \\4> k — (f) k \\ or \\-ip k — ip k \\, up 
to a maximum of 10 knots). The estimation errors are reported in Tabled] as "SVf" 
and "SVo", respectively. These two are extreme cases, so the actual estimation 
error of fi^ when the number of knots is selected by the user will fall somewhere 
between these two. 

Table [1] shows the root integrated squared errors, £' 1 / 2 (||/i — /i|| 2 ). Standard 
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Table 1: Simulation Results. Root mean integrated squared errors for tensor- 
product spline estimator (TPS) and FSVD estimators with fixed number of knots 
(SVf) and optimal number of knots (SVo). 

errors are not given, to avoid overcrowding the table, but all the differences are 
significant (the Technical Report shows boxplots of the simulated squared errors) . 
We see that for for which the order p of fi is correctly specified, the FSVD 
estimator with a fixed number of knots outperforms the tensor-product spline esti- 
mator in all situations but one (a — 1, m — 30, n — 50), while the FSVD estimator 
with optimal number of knots outperforms the tensor-product spline estimator in 
all situations (usually by a considerable margin) . 

For jj, 2 the situation reverses, as expected, since the order p is now underspeci- 
fied and then the bias does not vanish, even as m or n increase. Of course, it can be 
argued that p in practice is also chosen in a data-driven way: for large m and n, the 
estimators 3 and ^ 3 will be regular enough to call for a three-component estima- 
tor, which will make the FSVD estimator competitive again. The conclusion of this 
Monte Carlo study, then, is that FSVD estimators are competitive and even better 
than tensor-product splines as long as the number of components is not severely 
underspecified. Even if the estimated eigenvalues are small, for estimation pur- 
poses it is safer to include as many eigenfunctions as possible, as long as they are 
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not overwhelmed by noise. 



Acknowledgment 

This research was supported by the National Science Foundation, award number 
DMS 0604396. 

A Appendix 

The following proofs use functional analysis results that can be found, for instance, 
in Gohberg et al. (2003). Given fi e L 2 (§ x T), define the operator SDt : L 2 (1) ->■ 
L 2 (S) as (Wlf)(s) = f 7 n(s,t)f(t)dt. The adjoint of Wl is the operator Wl* : L 2 (S) ->■ 
L 2 (T) given by (Wl*g)(t) = f s n(s,t)g(s)ds. Let £1 = WlTl* and £ 2 = Tl*Tl. They 
are self-adjoint operators, £1 : L 2 (§>) — >■ L 2 (S) and£ 2 : £ 2 (T) — >■ £ 2 0D> with kernels 
fci(si,s 2 ) = J fx(s u t)fi(s 2 ,t)dt and k 2 (t u t 2 ) = f (j,(s,t 1 )(j,(s,t 2 )ds, respectively. 

Remember that for / e "K\ and g e !K 2 , the tensor-product operator g <g> / : 
IKi ->■ ^K 2 is defined as ® = (/, h)g. 

A. 1 Proof of Theorem 1 

Since R 2 is a self-adjoint integral operator, the spectral decomposition implies that 

£2 — Z] ■^feV'A; <8> V'fej where A& > and {^> fc } is an orthonormal system of eigenfunc- 
tions of & 2 , which can be completed to a basis of L 2 (7) by adding an orthonormal 
basis of ker(.ft 2 ), say {^ fc } (Gohberg et al., 2003, p. 180). This proves (2) of Theo- 
rem 1. Note that ker(£ 2 ) = ker(OJT): clearly ker(OJt) C ker(£ 2 ) because & 2 = 
but for any / e ker(£ 2 ), = (/, £ 2 /) = \\Wlf\\ 2 , which implies / e ker(9Jt) and then 
ker(£ 2 ) C ker(OJl). 

Now define <f) k = A^ 1/2 97l^ fe .The <f) k s are orthonormal in L 2 (S), since 

(hM = AT 1/2 A fc 1/2 (97l^,m) 

= A- 1/2 A- 1/2 (^-,^ fe ) = A7 1/2 A- 1/2 X k 5 jk . 

To prove (3) of Theorem 1, define the operator £ = \ l k /2 (j) k ®il) k . This operator 
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is well defined, since for any / e L 2 (7), we have £f = J2 \ k (ip k , f}4> k and 



Direct calculation shows that £ip k = Wlip k , and £ip k = 97fr/> fc = because ker(& 2 ) = 
ker(OJT). Since {i/; k } U {tp k } is a basis of L 2 (7), it follows that £ = Wl, which is (3) 
of Theorem 1 in different words. 

The identity (1) of Theorem 1 follows from (3), since ^ = WWJl*. In particular, 
this shows that the positive eigenvalues of £i are the same as those of 8. 2 , and the 
4> k s can be taken as the corresponding eigenfunctions. 

If the mean function /j,(s, t) is continuous, Mercer's Theorem (Gohberg et al., 
2003, p. 198) implies that the ip k s are continuous and k 2 satisfies (2) in Theorem 
1 in a pointwise manner, with the series converging absolutely and uniformly. 

The <f) k s are continuous by definition when fx is continuous. To prove that the 
identity (1) in Theorem 1 holds pointwise and that the series converges absolutely 
and uniformly, we essentially mimic the proof of Mercer's Theorem. See the Tech- 
nical Report for details. 

Finally, to show that expression (3) in Theorem 1 holds pointwise when the se- 
ries on the right-hand side converges absolutely and uniformly, note that both sides 
of expression (3) define the same operator from L 2 (7) to £ 2 (S), so the identity must 
hold almost everywhere, and by continuity, it must actually hold everywhere. ■ 

Remark. As by-products of the proof of Theorem 1 we obtain the identities 



||£/H 2 = ^A fe |(^,/)| 2 <||/|| 2 ^A fe <oo. 




and 




A. 2 Proof of Theorem 2 



Since {f k } and {g k } are orthonormal, 



p p 



fi-hf = \\n 



k=l k=l 
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which is minimized by a k = (g k , 9Jlf k ), k — 1, . . . ,p. Then, minimizing — h\\ 2 is 
equivalent to maximizing YX=i \{9k,%Rfk)\ 2 - By Cauchy-Schwartz inequality, 

J2\(9k,Wk)\ 2 < 52\\gk\\ 2 \\mfkf 

fe=l fc=l 

V P 

= ^|(OT/ fc ,ajt/ fc )| 2 = ^|(/ fc ,^ 2 / fc )| 2 . (7) 

k=l k=l 

It is well known (or see Gohberg et al., 2003, Section 4.9) that ([7]) is maximized by 
the leading p eigenfunctions of R 2 , and the maximum value is J2k=i ^k- Therefore 
ELi Kfl'fc) Wfe}| 2 < ELi A fc and equality holds for f k = tp k and g k = <p k , which 
completes the proof. ■ 



A. 3 Proof of Theorem 3 

Let z ijk = x ijk - ii(sj,tk), and define M = \^{sj,t k )\ jtk) , X; = [x ijk \ jtk) and Z* = 
[%'fe](i,fc)- Since f2 = B VKiVB and Ki= XUX T , we can write 

fW = ^(s) T VXUX T V^,(s) 

= /3,(s) T VM UM T V/^(s) (8) 

+2^(s) T VZUM T V^(s) (9) 

+/3,(s) T VZUZ T V/3,,(s). (10) 

We will show that ([8]) goes to Vt hh i as m and r go to infinity, and that ([9]) and (fTOl) 
go to zero in probability as n goes to infinity, uniformly in m and r. 
Since 

^(s) T VXUX T V^,(s) = 



3=1 j'=l I k=l 



it is clear that d8]) goes to Q hh > as m and r go to infinity, because both max vj and 
max u k go to zero as m and r go to infinity. 

With respect to ([9]), note that we can write it as 2y, with 

y t = f3 h (s) T VZ t VM^Vf3 h ,( S ). 
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The yiS are i.i.d. with E(y t ) = and V(y«) = V \j2T=i^l=i^h( s j) v j z ijkUkakh'j, 
with a fc/ j/ = X^=i v( s j'itk)vj>P h /(sj>). It can be proved that 



Km V(j/i) = /// / /^(siW^O/^^W^M^i^i), (s 2 , t 2 )}dsids 2 dtidt 2 , 



m— >oo 
r— >-oo 




where a h i(t k ) = J p(s,t k )(3 h ,(s)ds as m — > oo (see Technical Report). Then V(i/j) 
is bounded for any m and r, and a simple application of Tchebyshev's Inequality 
implies that ([9]) goes to zero in probability as n goes to infinity, uniformly in m and 
r. 

Regarding (fTOl) . note that 

/3,(s) T vzuz T v/3,,(s) < iiu^zTv^^ihiu^zTv^^s)!!. 

For a given index h, we can write U 1 / 2 Z T V/3 h (s) = w, with Wj = U 1/2 ZjV(3 h (s). 
The WjS are i.i.d. with E(wj) = and 

rfe,5Z V (^*) = /// Ph(si)^h(s2)p{(si,t) } (s 2 ,t)}ds 1 ds 2 dt 

r— >oo fc = i J J J 

(again, see Technical Report). Since E(||w|| 2 ) = n~ x Y7k=i V( w ik), a straightforward 
application of Markov's Inequality implies that ||w|| goes to zero in probability as n 
goes to infinity, uniformly in m and r, and consequently the same is true for (fTOl) . 
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